Normalization Strategy of Logical Knowledge Representation for Text Document
Similar resources
Text Representation for Efficient Document Annotation
In text classification, the amount and quality of training data are crucial for the performance of the classifier. Training data is generated by human labellers, which is tedious and time-consuming work. To reduce the labelling time for single documents, we propose to use condensed representations of text documents instead of the full-text document. These condensed representations are key sen...
Text Type Structure And Logical Document Structure
Most research on automated categorization of documents has concentrated on the assignment of one or many categories to a whole text. However, new applications, e.g. in the area of the Semantic Web, require a richer and more fine-grained annotation of documents, such as detailed thematic information about the parts of a document. Hence we investigate the automatic categorization of text segments...
Knowledge-based derivation of document logical structure
The analysis of a document image to derive a symbolic description of its structure and contents involves using spatial domain knowledge to classify the different printed blocks (e.g., text paragraphs), group them into logical units (e.g., newspaper stories), and determine the reading order of the text blocks within each unit. These steps describe the conversion of the physical structure of a do...
Using an Interlingua for Document Knowledge Representation
In this paper, the authors advocate using an interlingua for representing the knowledge contained in text documents. The advocated interlingua, UNL, was designed by the United Nations University to support a language-independent textual representation to overcome linguistic barriers on the Internet. This paper describes the main features of UNL and presents the application of this inter...
Using Background Contextual Knowledge for Document Representation
We describe our approach to document representation that captures contextual dependencies between terms in a corpus and makes use of these dependencies to represent documents. We have tried our representation scheme for automatic document categorisation on the Reuters’ test set of documents. We achieve a precision-recall break-even point of 84%, which is comparable to the best known published re...
Journal
Journal title: International Journal of Engineering and Technology
Year: 2013
ISSN: 1793-8236
DOI: 10.7763/ijet.2013.v5.520